Journal article
MetaPilot: A DRL-based controller for dynamic adaptation to shifting scheduling objectives in HPC systems
L Wang, MA Rodriguez, N Lipovetzky
Future Generation Computer Systems | Elsevier BV | Published: 2026
Abstract
Efficient job scheduling in high-performance computing (HPC) systems necessitates the simultaneous consideration of system-centric objectives, such as maximizing resource utilization, and user-centric objectives, such as minimizing job waiting times. In practice, the relative importance of these objectives is not static, but shifts dynamically in response to fluctuations in workload characteristics and system state. However, existing scheduling frameworks - including both traditional workload managers and reinforcement learning (RL)-based methods - typically rely on fixed policies or fixed reward functions that encode a predetermined combination of objectives. As a result, they lack the flex..